Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id QAA04917
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Thu, 10 Sep 1998 16:54:52 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA31445; Thu, 10 Sep 1998 16:54:25 -0700
To: icon-group@optima.CS.Arizona.EDU
Date: 10 Sep 1998 20:00:24 GMT
From: jeffery@cs.utsa.edu (Clinton Jeffery)
Message-Id: <6t9b4o$8rs$1@ringer.cs.utsa.edu>
Organization: The University of Texas at San Antonio
Sender: icon-group-request@optima.CS.Arizona.EDU
References: <35F723CF.76B3CC97@Japan.NCR.COM>
Reply-To: jeffery@cs.utsa.edu
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Eric Hildum (Eric.Hildum@Japan.NCR.COM) wrote (paraphrased and edited):
: Icon ... supporting only ASCII makes it less useful for non-English languages.
: With Unicode... it should be possible to begin including support for
: non-English and non-alphabetic languages.
: Has anyone thought about this yet? What does string and pattern matching
: mean in, for example, Japanese?
1. Other folks have been thinking about it, especially Icon users in Asia.
For example, a Chinese version of Icon has been done by researchers in China.
2. Going to Unicode might not be *that* difficult, but I think Unicode isn't
really as widely adopted as you might suggest. Many people seem to be using
mixed 8/16-bit strings.
3. The semantics of string and pattern matching are no different in Japanese
than in English. There is nothing specific to language or grammar in the Icon
string and pattern matching repertoire. Of course, when the character set
changes the actual code needs to change...
4. Let's look at the current situation for mixed character sets. I am not
sure how Chinese Icon stands on these, but consider plain old Windows Icon.
Divide the functionality as follows:
    non-alphabetic output: Windows Icon can already do this.
    non-alphabetic input: there are known bugs in the input processing,
        either in Windows Icon or in the IPL "vidgets" code.
    non-alphabetic string scanning: not supported, but could be
        implemented as Icon Program Library procedures. Even
        Unicode string semantics could be implemented as library
        procedures on top of (even-length!) Icon strings.
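
To make the "library procedures on top of even-length strings" idea concrete,
here is a minimal sketch. It is written in Python rather than Icon, with a
bytes object standing in for an 8-bit Icon string, and it assumes two bytes
per character (UTF-16 big-endian, Basic Multilingual Plane only). The names
wide, wlen, wchar and wfind are hypothetical; they are not existing IPL
procedures. Positions are 1-based, as in Icon.

```python
from typing import Iterator


def wide(text: str) -> bytes:
    """Encode text as fixed 2-byte units (hypothetical helper)."""
    data = text.encode("utf-16-be")
    if len(data) != 2 * len(text):
        # Surrogate pairs would break the fixed-width assumption.
        raise ValueError("characters outside the BMP are not handled here")
    return data


def wlen(s: bytes) -> int:
    """Length in characters; a valid wide string has even byte length."""
    if len(s) % 2:
        raise ValueError("wide strings must have even length")
    return len(s) // 2


def wchar(s: bytes, i: int) -> bytes:
    """Return the i-th character (1-based) as a 2-byte string."""
    if not 1 <= i <= wlen(s):
        raise IndexError("position out of range")
    return s[2 * (i - 1):2 * i]


def wfind(pat: bytes, s: bytes, start: int = 1) -> Iterator[int]:
    """Generate the 1-based character positions where pat occurs in s.

    Only offsets on a 2-byte boundary are tried, so a match can never
    straddle two characters.
    """
    n, m = wlen(s), wlen(pat)
    for i in range(start, n - m + 2):
        offset = 2 * (i - 1)
        if s[offset:offset + 2 * m] == pat:
            yield i


if __name__ == "__main__":
    text = wide("日本語本")
    print(wlen(text))                     # length in characters
    print(list(wfind(wide("本"), text)))  # character positions of the match
```

The alignment check is the part that a plain byte-level search does not give
you: in the two-character string U+4100 U+4100 the bytes are 41 00 41 00, so
a byte search for "A" (00 41) reports a hit in the middle of the string,
while wfind correctly reports none.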
We don't really need much additional infrastructure. Coordinating the library
procedures for this would make an interesting project for some folks in the
user community. At this point we also need someone who can compile Icon from
its C source and debug I/O problems on a non-alphabetic platform.
--
Clint Jeffery, jeffery@cs.utsa.edu
Division of Computer Science, The University of Texas at San Antonio
Research http://www.cs.utsa.edu/research/plss.html